NSF PAR Search | NSF Public Access Repository

Large-scale evidence for logarithmic effects of word predictability on reading time

https://doi.org/10.1073/pnas.2307876121

Shain, Cory; Meister, Clara; Pimentel, Tiago; Cotterell, Ryan; Levy, Roger (March 2024, Proceedings of the National Academy of Sciences)

Full Text Available
Testing the Predictions of Surprisal Theory in 11 Languages

https://doi.org/10.1162/tacl_a_00612

Wilcox, Ethan G.; Pimentel, Tiago; Meister, Clara; Cotterell, Ryan; Levy, Roger P. (January 2023, Transactions of the Association for Computational Linguistics)

Abstract: Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
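The two quantities named in this abstract, surprisal (negative log probability in context) and contextual entropy (expected surprisal), can be illustrated with a minimal sketch. The next-word distribution below is made up for illustration, the function names are hypothetical, and the choice of base-2 logarithms (bits) is an assumption; none of this is taken from the paper's own code.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: negative log2 probability of a word in its context."""
    return -math.log2(p)

def contextual_entropy(dist: dict[str, float]) -> float:
    """Expected surprisal over the next-word distribution, in bits."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# Hypothetical next-word distribution for a single context (illustrative only).
next_word = {"dog": 0.5, "cat": 0.25, "bird": 0.25}

print(surprisal(next_word["dog"]))    # 1.0
print(surprisal(next_word["cat"]))    # 2.0
print(contextual_entropy(next_word))  # 1.5
```

In a reading-time study these values would come from a language model's predicted distribution at each word rather than from a hand-written dictionary.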
Full Text Available
A Cross-Linguistic Pressure for Uniform Information Density in Word Order

https://doi.org/10.1162/tacl_a_00589

Clark, Thomas Hikaru; Meister, Clara; Pimentel, Tiago; Hahn, Michael; Cotterell, Ryan; Futrell, Richard; Levy, Roger (January 2023, Transactions of the Association for Computational Linguistics)

Abstract: While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: The uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.
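As a rough illustration of what "information uniformity" could mean in practice, the sketch below scores a sequence by the variance of its per-word surprisals, with lower variance read as more uniform. This is only one possible operationalization, assumed here for illustration; the abstract does not say which measure the authors use, and the surprisal values and the function name are hypothetical.

```python
from statistics import pvariance

def surprisal_variance(surprisals: list[float]) -> float:
    """Population variance of per-word surprisals; lower means more uniform."""
    return pvariance(surprisals)

# Hypothetical per-word surprisals (in bits) for the same words in two orders.
real_order = [2.0, 2.0, 2.0, 2.0]            # information spread evenly
counterfactual_order = [0.0, 4.0, 0.0, 4.0]  # same total, but peaked

print(surprisal_variance(real_order))            # 0.0
print(surprisal_variance(counterfactual_order))  # 4.0
```

Both sequences carry the same total information (8 bits), so the comparison isolates how evenly that information is distributed across positions.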
Full Text Available
